1. The Compiler System



This chapter describes the components of the compiler system and how to 
use them. 



1.1 Overview 



The components that make up the compiler system and the task each
performs are summarized below:

Task                              Tool

Write & Edit Programs             UNIX System Editors
Compile, Link, & Load Programs    The Link Editor & the Compiler Drivers
Debug Programs                    The Symbolic Debugger
Profile Programs                  The Profiler
Optimize Programs                 The Optimizer
1.2 The Drivers



Programs called drivers invoke the following major components of the
compiler system: the macro preprocessor (cpp), the compiler (cc or f77),
the assembler (as), and the link editor (ld). This section gives an
overview of driver operations and commands.



1.2.1 Driver Commands 

The commands f77(1) and cc(1) run the drivers that cause your programs
to be compiled, optimized, assembled, and link edited.

Each command knows which libraries are appropriate for the main
program's language and passes only those libraries to the link editor.



1.2.2 Files 

The drivers recognize the contents of an input file by the suffix assigned 
to the filename, as shown below. 
Suffix    Description

.e        efl source
.r        ratfor source
.s        assembly source
.i        source is assumed to be in the language of the processing
          driver. For example, in

              f77 -c source.i

          source.i is assumed to contain FORTRAN source statements.
.c        C source
.f        Fortran 77 source
.u        ucode object file
.b        ucode object library
.o        object file
.a        object library



NOTE: The assembly driver as assumes that any file, regardless of the 
suffix, contains assembly language statements; as accepts only one input 
source file. 
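Conceptually, the suffix rules above are a table lookup on the part of the filename after its last dot. The C function below is a hypothetical sketch of that lookup, written only to illustrate the rule; suffix_description and struct suffix_entry are invented names and are not taken from the drivers' actual source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: one row of the suffix table above. */
struct suffix_entry {
    const char *suffix;
    const char *description;
};

static const struct suffix_entry suffix_table[] = {
    { ".e", "efl source" },
    { ".r", "ratfor source" },
    { ".s", "assembly source" },
    { ".i", "source in the language of the processing driver" },
    { ".c", "C source" },
    { ".f", "Fortran 77 source" },
    { ".u", "ucode object file" },
    { ".b", "ucode object library" },
    { ".o", "object file" },
    { ".a", "object library" },
};

/* Returns the description for the suffix of filename, or NULL when the
 * name has no suffix or the suffix is not in the table. Only the last
 * path component is examined, so "src.d/README" has no suffix. */
const char *suffix_description(const char *filename)
{
    const char *base = strrchr(filename, '/');
    const char *dot;
    size_t i;

    base = (base != NULL) ? base + 1 : filename;
    dot = strrchr(base, '.');
    if (dot == NULL)
        return NULL;
    for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++) {
        if (strcmp(dot, suffix_table[i].suffix) == 0)
            return suffix_table[i].description;
    }
    return NULL;
}
```

For example, suffix_description("rest.f") yields "Fortran 77 source". As the NOTE above says, the as driver does not make this distinction at all: it treats every input file as assembly source.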



1.2.3 Operational Overview

Figure 1-1 on the next page shows the relationship between the major
components of the compiler system and their primary inputs and outputs.

Note that FORTRAN uses preprocessors (see Figure 1-2) that the other
languages do not use. For more information, see the efl(1), ratfor(1),
and m4(1) manual pages in the IRIS-4D User's Reference Manual.
Figure 1-1. The FORTRAN Compiler System Driver

Figure 1-2. The FORTRAN Preprocessors
1.2.4 Default Options

At compilation, you can select one or more options that affect a variety of
program development functions, including debugging, optimization, and
profiling facilities, and the names assigned to output files.

Some options have defaults, which apply even if you don't specify them.
For example, the default name for an object file is filename.o, where
filename is the name of the source file without its suffix; the default
name for an executable program object is a.out. The following example
uses these defaults in compiling the source files foo.c and bar.c:

% cc foo.c bar.c
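The object-file naming rule can be stated as a small string operation: drop the source suffix and append .o. The C function below is a hypothetical sketch of that rule (default_object_name is an invented name). It assumes, as traditional UNIX drivers do, that the object file is written to the current directory, so any leading directory in the source name is dropped; check the cc(1) manual page for the drivers' exact behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the default naming rule: the object file for
 * source file "foo.c" is "foo.o". Assumes the .o file goes in the
 * current directory, so "src/bar.f" maps to "bar.o". Writes the name
 * into out (outsize bytes) and returns 0; returns -1 if the name has
 * no suffix or out is too small. */
int default_object_name(const char *source, char *out, size_t outsize)
{
    const char *slash = strrchr(source, '/');
    const char *base = (slash != NULL) ? slash + 1 : source;
    const char *dot = strrchr(base, '.');
    size_t stem;

    if (dot == NULL || dot == base)
        return -1;                      /* no suffix to replace */
    stem = (size_t)(dot - base);
    if (stem + 3 > outsize)             /* stem + ".o" + '\0' */
        return -1;
    memcpy(out, base, stem);
    memcpy(out + stem, ".o", 3);        /* copies the '\0' as well */
    return 0;
}
```

For the command above, the sketch gives foo.o and bar.o; the linked program is named a.out unless you use the -o option.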



1.2.5 Compiling Multi-Language Programs 

When the source language of the main program differs from that of a sub- 
program, you should compile each program module separately with the 
appropriate driver and then link them in a separate step. You can create 
objects suitable for link editing by specifying the -c option, which stops 
the driver immediately after the assembler phase. For example: 

% cc -c main.c more.c
% f77 -c rest.f

Figure 1-3 below shows the compilation control flow for these two
commands.
Figure 1-3. Compilation Control Flow

1.2.6 Linking Objects

You can also use a driver command to link edit separate objects into one
executable program. The driver recognizes the .o suffix as the name of a
file containing object code suitable for link editing and immediately
invokes the link editor.

For more information on the link editor and on specifying link libraries,
see the "Link Editor" section of this chapter. For a detailed listing of the
default libraries, see the cc(1) and f77(1) manual pages in the IRIS-4D
User's Reference Manual.
1.3 Compilation Options



The tables on the following pages summarize the options you can specify
for the compilation phases, which run from the preprocessing phase through
the assembly phase (Figure 1-1 shows these phases). The option summaries
are divided into the following major groups:

• General options.

• Debugging options.

• Profiling and optimization options. See Chapter 2 for information on the
advantages of profiling and optimization, and on how to use the profiler.

• Compiler development options.

NOTE: The tables list only the most frequently used options; they don't
list all available options. See the cc(1) and f77(1) manual pages in the
IRIS-4D User's Reference Manual for a complete list of options available.
1.3.1 General Options



The general options are listed in alphabetical order in the tables that
follow.



Option Name      Purpose

-c               Prevents the link editor from linking your program
                 after compilation. This option forces the compiler
                 to produce a .o file even when you compile only
                 one program.

-C               C driver only. Used with the -P or -E
                 options. Prevents the macro preprocessor
                 from stripping comments. Use this option
                 when you suspect the preprocessor is not
                 emitting the intended code and you wish to
                 examine the code with its comments.

-C               Pascal and FORTRAN drivers only. Generates
                 code that causes range checking for subscripts
                 during program execution.

-E               C driver only. Runs only the C macro preprocessor
                 and sends the results to the standard output.
                 Specify -C as well to retain comments. Use -E when
                 you suspect the preprocessor isn't emitting the
                 intended code.

-D name          Defines a macro name as if you had specified a
-D name=def      #define in your program. Unless you
                 specify a definition with name=def, the
                 compiler defines the name to be "1".
Option Name      Purpose

-G num           num is a decimal number that specifies the
                 maximum size in bytes of an item to be placed
                 in the global pointer area. The default is
                 512 bytes. You can raise or lower num to
                 control the number of data items placed in
                 the global pointer area.

                 Note: If you receive the link editor message
                 "Too much data in the gp area . . .", you must
                 recompile the module with a lower value for num.

-I dirname       The compiler searches the current directory,
                 dirname, and the default directory, /usr/include,
                 in that order for the include file.

-I               When specified in addition to -I dirname,
                 the compiler searches only dirname for the include
                 file (it does not search the default directory).

-nocpp           Does not run the C macro preprocessor on FORTRAN,
                 C, and assembly source files before processing.
Option Name      Purpose

-o filename      Assigns the name filename to the program
                 object. When used with the -c option,
                 tells where to leave the .o file.
                 The default filename is a.out.

-P               Same as the -E option, except that it puts the
                 results in an .i file. Specify both -P and -C
                 to retain comments.

-p1 or -p        Permits program counter (pc) sampling. This
                 option provides operational statistics for use
                 in improving program performance. See
                 Chapter 2 for details.

                 Note: This option affects only the link editor
                 and is ignored by the compiler front-ends.
                 When link editing as a separate step from
                 compilation, be sure to specify this option if
                 pc sampling is desired.

-S               Similar to -c, except that it produces assembly
                 code in an .s file instead of object code in
                 an .o file.

-std             Issues a warning message when the compiler
                 finds a non-standard feature in the
                 programming language of your source
                 program.

-U name          Overrides a definition of a macro name
                 that you specified with the -D option,
                 or that is defined automatically by the
                 driver.

-v               Lists compiler phases as they are executed.
                 Use this option when you suspect a phase
                 isn't being run as you intended. For example,
                 the option might reveal that you failed to specify
                 a library required by the link editor.

-V               Prints the version number of the driver and
                 its phases. When reporting a suspected
                 compiler problem, you must include this
                 number.

-w               Suppresses warning messages.
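The -D and -U entries above describe macro names that are defined, or removed, on the command line rather than in the source. The C fragment below is a hypothetical illustration of code that depends on such a name; DEBUG_LEVEL, debug_level, and trace_enabled are invented for this example. With -DDEBUG_LEVEL the name is defined as 1, with -DDEBUG_LEVEL=3 it is defined as 3, and when it is not defined (for example, after -UDEBUG_LEVEL) the #ifndef fallback applies.

```c
/* Hypothetical example: DEBUG_LEVEL is meant to come from the command
 * line. -DDEBUG_LEVEL defines it as 1, -DDEBUG_LEVEL=3 defines it as 3,
 * and -UDEBUG_LEVEL removes a definition. If it ends up undefined,
 * the fallback below makes it 0. */
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 0
#endif

/* Reports the level this file was compiled with. */
int debug_level(void)
{
    return DEBUG_LEVEL;
}

/* Nonzero when messages at the given level (1 or higher) are enabled. */
int trace_enabled(int level)
{
    return level > 0 && level <= DEBUG_LEVEL;
}
```

For instance, compiling this file with cc -c -DDEBUG_LEVEL=2 would make trace_enabled(1) and trace_enabled(2) return nonzero, without any change to the source.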



1.3.2 Debugging Options

The table below lists the compiler options available for debugging source
code using dbx.



Option Name      Purpose

-g0 *            Produces a program object without debugging
                 information. Reduces the size of the
                 program object and should be used when
                 debugging is no longer required.
                 Retains all optimizations.

-g1              Permits accurate, but limited, source-level
                 debugging. This option does most
                 optimizations.

-g or -g2        Permits full source-level debugging. These
                 options often suppress optimizations that
                 might interfere with full debugging.

-g3              Permits full, but inaccurate, debugging on
                 fully optimized code. Debugger output may
                 be confusing or misleading. Specify this
                 option for programs that malfunction only
                 after you attempt to optimize them.

* Default option.



1.3.3 Profiling Option 

The compiler system can generate profiled programs that, when executed,
provide operational statistics. You do this with the compiler option -p
(which provides pc sampling information) and the pixie program (which
provides profiles of basic block counts). See Chapter 2 for details.
1.3.4 Optimizer Options

The table below summarizes the options available for program
optimization. To fully understand the benefits of optimization and how
the compiler achieves it, read the "Optimization" section in Chapter 2 of
this manual.

Option Name      Purpose

-O or -O2        Global optimization. Optimizes within the
                 bounds of individual compilation units. This
                 option runs the global optimizer (uopt) phase.

-O0              No optimization. Prevents all optimizations,
                 including the minimal optimization normally
                 performed by the code generator and assembler.

-O1 *            The assembler and the code generator perform
                 as many optimizations as possible without
                 affecting compile-time performance.

-O3              All optimizations, including procedure inlining.
                 This option must precede all source file arguments.
                 With this option, a ucode object file is created for
                 each source file and is left in a .u file. If the -c
                 option is not present, the runtime startup routine,
                 runtime libraries, and ucode versions of the runtime
                 libraries are linked, as well as the newly created
                 ucode object files and any ucode object files specified
                 on the command line. If the -c option is present, only
                 the newly created ucode files and those specified on
                 the command line are linked. Procedure inlining is
                 done on the resulting linked file. This file is compiled,
                 and the object file is left in u.out.o by default.
                 If -ko output is specified, the object file is left in
                 output with a suffix of .o.

* Default option.



1.3.5 Compiler Development Options 

In addition to the standard options, each driver also has options that you
normally won't use. These options primarily aid compiler development
work. For information about how to use these options, consult the cc(1)
and f77(1) manual pages in the IRIS-4D User's Reference Manual.
1.4 Including Common Files (Definition Files)

When you write programs, you often have common definition files that
you share among a program's modules. Common files define things like
known constants or the parameters for system calls (for example, the files
that define the object file formats).

Because globally shared definitions should be kept in one place, you need
a way to make that single copy available to every file that uses it.
Definition files (often called header files in the C programming language)
let you share common information among many files in a program.

Many people call these files #include or "header" files. These files have
a ".h" suffix. Typically, a manual page from the IRIS-4D User's Reference
Manual tells you to include a specific definition file.

Each supported language handles these files the same way, and you specify 
these files in your program's source code. 

NOTE: If you intend to debug your program using dbx, you should not 
place executable code in an include file. The debugger recognizes an in- 
clude file as one line of source code; none of the source lines in the file 
appears during the debugging session. 

To specify an include file in your program, put a line like this at the be- 
ginning of your program: 

#include "test.h" 

You can include files in your program source files in either of two ways:

1. In column 1 of your source file, type:

#include "filename"

where filename is the name of the include file. Because you placed
filename within double quotation marks ("), the C macro preprocessor
searches in sequence the current directory and the default directory
/usr/include.

2. In column 1 of your source file, type:

#include <filename>

where filename is the name of the include file. Because you placed
filename between the less-than and greater-than signs (< >), the C macro
preprocessor skips the current directory and searches only the default
directory /usr/include for the include file.
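The two forms differ only in whether the current directory is searched first. The C function below is a hypothetical sketch of the search order as this section and the -I dirname entry describe it (include_search_order is an invented name). It also assumes, as in standard cpp, that directories given with -I apply to the <filename> form and are searched ahead of /usr/include.

```c
#include <stddef.h>

/* Hypothetical sketch of the include-file search order described in
 * this section. Stores the directories to try, in order, in dirs[]
 * and returns how many were stored (never more than maxdirs).
 *   quoted  nonzero for #include "file", zero for #include <file>
 *   extra   directories given with -I dirname (may be NULL)
 *   nextra  number of entries in extra
 * Assumption: as in standard cpp, -I directories are also searched
 * for the <file> form, ahead of /usr/include. */
size_t include_search_order(int quoted,
                            const char *const *extra, size_t nextra,
                            const char **dirs, size_t maxdirs)
{
    size_t n = 0;
    size_t i;

    if (quoted && n < maxdirs)
        dirs[n++] = ".";                /* current directory first */
    for (i = 0; i < nextra && n < maxdirs; i++)
        dirs[n++] = extra[i];           /* then each -I directory */
    if (n < maxdirs)
        dirs[n++] = "/usr/include";     /* default directory last */
    return n;
}
```

For #include "test.h" compiled with -I inc (a hypothetical directory), the list is ., inc, /usr/include, which matches the order given for -I dirname in the general options table.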

C, FORTRAN 77, and assembly code can reside in the same include files
and can then be conditionally included in programs as required. To set
up a shareable include file, create an .h file and enter the respective
code as shown in Figure 1-4:

#ifdef LANGUAGE_C
    C code
#endif
#ifdef LANGUAGE_FORTRAN
    Fortran code
#endif
#ifdef LANGUAGE_ASSEMBLY
    MIPS Assembly code
#endif

Figure 1-4.

NOTE: When you write your program, you need to include the .h file
that you created.
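As a concrete, hypothetical instance of Figure 1-4, the fragment below shows what a definition file shared by C, FORTRAN, and assembly code might contain; the file name mydefs.h, the name MAXBUF, its value, and the buffer_t type are illustrative only. The file is inlined into a single C source so that it compiles on its own. It assumes that the drivers predefine LANGUAGE_C, LANGUAGE_FORTRAN, or LANGUAGE_ASSEMBLY for the corresponding language; because other C compilers do not, the first lines supply LANGUAGE_C when none of them is defined.

```c
/* ---- contents of a hypothetical shared file, mydefs.h ---- */

/* Assumption: the drivers predefine one of these names. Supply
 * LANGUAGE_C when compiling with a C compiler that does not. */
#if !defined(LANGUAGE_C) && !defined(LANGUAGE_FORTRAN) && \
    !defined(LANGUAGE_ASSEMBLY)
#define LANGUAGE_C 1
#endif

#ifdef LANGUAGE_C
#define MAXBUF 1024
typedef struct {
    char data[MAXBUF];
    int used;                   /* bytes currently in data */
} buffer_t;
#endif

#ifdef LANGUAGE_FORTRAN
      integer MAXBUF
      parameter (MAXBUF = 1024)
#endif

#ifdef LANGUAGE_ASSEMBLY
#define MAXBUF 1024
#endif

/* ---- C code that would normally say #include "mydefs.h" ---- */

/* Returns how many more bytes fit in the buffer. */
int buffer_room(const buffer_t *b)
{
    return MAXBUF - b->used;
}
```

When f77 preprocesses the same file, only the PARAMETER statement (starting in column 7) survives; when the assembler's preprocessing pass reads it, only the #define does, provided the preprocessor has not been turned off with -nocpp.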



1.5 Link Editor 



This section summarizes the functions of the link editor and how it
works. Refer to the ld(1) manual page in the IRIS-4D User's Reference
Manual for complete information on the link editor options and libraries.

The link editor combines one or more object files (in the order specified)
into one program object file, performing relocation, external symbol
resolution, and all the other processing required to make object files
ready for execution. Unless you specify otherwise, the link editor names
the program object file a.out. You can execute the program object or use
it as input for another link editor run.

The link editor supports all the standard command line features of other
UNIX system link editors except System V ifiles. (An ifile holds a
description of a load module.)



1.5.1 Running the Link Editor 

You can run the link editor by typing ld on the command line of your
shell or by using one of the driver commands as described in the section
"Linking Objects" in this chapter. In most cases, you should use the
driver commands to call the link editor so that the object is linked with
the appropriate system libraries and link editor options. The syntax of
the ld command is as follows:

ld options object1 [object2 ... objectn]

NOTE: The assembler driver as does not run the link editor. To link
edit a program written in assembly language, do either of the following:

• Assemble and link edit using one of the other driver commands (cc,
for example). The .s suffix of the assembly language source file
causes the driver to invoke the assembler.

• Assemble the file using as, then link edit the resulting object file with
the ld command.



1.5.2 Specifying Libraries 

If you compile multi-language programs, be sure to explicitly load any
required runtime libraries. For a list of the libraries that a language uses,
see the cc(1) and f77(1) manual pages in the IRIS-4D User's Reference
Manual.

You may need to specify libraries when you use UNIX system packages 
that are not part of a particular language. Most of the manual pages for 
these packages list the required libraries. 

NOTE: The link editor searches libraries in the order you specify. 
1.5.3 Link Editor Options



Table 1-1 summarizes the link editor options. Refer also to the list of
general options earlier in this chapter and to the ld(1) manual page in the
IRIS-4D User's Reference Manual for complete information on options
and libraries that affect link editor processing.



Option Name      Purpose

-b               Tells ld not to merge symbolic information
                 entries from the same file into one entry
                 for that file. Use this option when a file
                 compiled for debugging has variables with
                 the same names but different attributes. This
                 can occur when compiling two object files
                 that use the same include file, and variables
                 with the same name differ because of
                 conditional statements within the file.

-Bnum            Sets the starting address of the uninitialized
                 data segment (bss) to the hexadecimal
                 address num. This option is valid only
                 when you've also specified the -N link
                 editor option described later in this table.

-Bstring         Appends string to the library name created
                 by the -lx or -klx option.

-d               Forces the definition of common storage and
                 link editor-defined symbols even if -r
                 is specified.

-Dnum            Sets the starting address of the data segment
                 (data) to the hexadecimal address num.
                 This option is valid only when you've
                 also specified the -N link editor option.

-e epsym         Sets the default entry point address for
                 the output file to the specified symbol
                 epsym.

Table 1-1. Link Editor Options
Option Name      Purpose

-f fill          Sets the fill pattern for "holes" within an
                 output section of an object file; fill is a
                 four-byte hexadecimal constant that defines the
                 fill pattern.

-Gnum            Specifies the maximum size (in decimal
                 bytes) of a .comm item that should be
                 allocated in the small uninitialized data
                 (sbss) section for reference by the
                 global pointer.

-bestGnum        Calculates the best -G num value to use when
                 compiling and linking.

-count           These options control which objects are counted as
-nocount         recompilable for the best -G num calculation. By
-countall        default, the -bestGnum option assumes that all
                 files can be recompiled with a different -G num
                 option. If you cannot recompile certain object
                 files or libraries, use these options to tell the link
                 editor so that it calculates the proper -G num value.
                 -nocount states that the object file following it on
                 the command line cannot be recompiled;
                 -countall overrides all -nocount
                 options following it on the command line.

Table 1-1. Link Editor Options (continued)
Option Name      Purpose

-lx              Specifies the name of a link library, where x
                 is the library name. The link editor searches
                 for libx.a in the /lib, /usr/lib, and /usr/local/lib
                 directories, in that order. For example, if you
                 specify -lcurses, the library pathnames can be:

                     /lib/libcurses.a
                     /usr/lib/libcurses.a
                     /usr/local/lib/libcurses.a

                 If a library relies on procedures or data
                 from another library, specify the name of the
                 library that relies on the other one first.

                 If a library resides in a directory other than
                 /lib, /usr/lib, or /usr/local/lib, use the -L
                 option to specify the appropriate directory
                 for that library.

-L dirname       Searches dirname for libraries specified with
                 the -l option before searching the directories
                 /lib, /usr/lib, and /usr/local/lib.
                 This option must precede the -l option.

-L               If the link editor doesn't find the library
                 in dirname, then /lib, /usr/lib, and /usr/local/lib
                 are NOT searched. An -L dirname option
                 must be specified with -L.

-m               Produces a link editor memory map
                 in System V format.

-M               Produces a link editor memory map
                 in BSD format.

Table 1-1. Link Editor Options (continued)
Option Name      Purpose

-n *             Creates an NMAGIC file. The text segment
                 is read-only and shareable by all users
                 of the file.

-N *             Creates an OMAGIC file. The text segment
                 is neither read-only nor shareable by other
                 users. The data segment follows immediately
                 after the text segment.

-o filename      Specifies a name for your object file. If you
                 don't specify a name, the link editor uses
                 a.out as the default.

-p file          Preserves the symbol names listed in file when
                 loading ucode object files. The symbol names
                 in file are separated by blanks, tabs, or new
                 lines. See "Optimizing Frequently Used
                 Modules" in Chapter 2 for an example.

-r               Performs a partial link edit, retaining
                 relocation entries. This is required if
                 the object is to be link edited again with
                 other objects in the future. The option
                 causes the link editor not to define common
                 symbols and to suppress messages on
                 unresolved references.

-s               Strips symbol table information from the
                 program object, reducing its size.
                 This option might be useful when linking
                 routines that are frequently linked into other
                 program objects.

-T num           Sets the origin for the text segment to the
                 specified hexadecimal number. The
                 default origin is 0x400000. The contents
                 and format of the text segment are described
                 in Chapter 9 of the Assembly Language
                 Programmer's Guide.

-u symname       Makes symname undefined so that library
                 components that define symname are
                 loaded.

* See Chapter 9 of the Assembly Language Programmer's Guide for more
information on NMAGIC and OMAGIC files.

Table 1-1. Link Editor Options (continued)
Option Name      Purpose

-v               Prints the name of each file as it is
                 processed by the link editor.

-V               Prints the link editor version number.
                 You might need this number, for example,
                 when reporting a suspected bug in the link
                 editor.

-VS num          Puts the specified decimal version stamp
                 num in the object file that the link editor
                 produces.

-x               Retains external and static symbols in the
                 symbol table to allow some debugging
                 facilities. Doesn't retain local (non-global)
                 symbols.

Table 1-1. Link Editor Options (continued)
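The -lx, -L dirname, and bare -L entries in Table 1-1 together define an ordered list of candidate pathnames for each library. The C function below is a hypothetical sketch that builds that list (library_candidates and MAX_PATH_LEN are invented names); a link editor would then use the first candidate that actually exists. It is an illustration of the search order only, not ld's implementation.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 256

/* Hypothetical sketch of the -lx search described in Table 1-1. Fills
 * paths[] with candidate pathnames for lib<name>.a, in search order:
 * each -L dirname first, then /lib, /usr/lib, and /usr/local/lib unless
 * search_defaults is zero (the effect of the bare -L option). A link
 * editor would use the first candidate that exists. Returns the number
 * of candidates stored. */
size_t library_candidates(const char *name,
                          const char *const *ldirs, size_t nldirs,
                          int search_defaults,
                          char paths[][MAX_PATH_LEN], size_t maxpaths)
{
    static const char *const defaults[] = {
        "/lib", "/usr/lib", "/usr/local/lib"
    };
    size_t n = 0;
    size_t i;

    if (name == NULL || strlen(name) == 0)
        return 0;
    for (i = 0; i < nldirs && n < maxpaths; i++)
        snprintf(paths[n++], MAX_PATH_LEN, "%s/lib%s.a", ldirs[i], name);
    if (search_defaults) {
        for (i = 0; i < 3 && n < maxpaths; i++)
            snprintf(paths[n++], MAX_PATH_LEN, "%s/lib%s.a",
                     defaults[i], name);
    }
    return n;
}
```

With -L /home/me/lib -lcurses (a hypothetical directory), the first candidate is /home/me/lib/libcurses.a, followed by the three default locations shown in the -lx entry; adding the bare -L option leaves only the first.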



1.6 Object File Tools 

The following tools provide information on object files as indicated: 

• odump: lists the contents (including the symbol table and header in- 
formation) of an object file. 

• nm: lists only symbol table information. 

• file: provides descriptive information on the general properties of the 
specified file (for example, the programming language used). 



• size: prints the size of the text, data, rdata, sdata, bss, and sbss
sections. The format of these sections is described in Chapter 9 of the
Assembly Language Programmer's Guide.

The sections that follow describe these tools in detail.
1.6.1 Dumping Selected Parts of Files (odump)

The odump tool lists headers, tables, and other selected parts of an object
or archive file.

odump options filename1 [filename2 ... filenamen]

In the above syntax description, options is one or more of the options and
suboptions listed in Tables 1-2 and 1-3; filename is the name of one or
more object or archive files whose contents are to be dumped. For more
information, see the odump(1) manual page in the IRIS-4D User's Reference
Manual.



Option Name      Purpose

-a               Dumps the archive header of each member
                 of the specified archive library file.

-c               Dumps the string table.

-f               Dumps each file header.

-F               Dumps the file descriptor table.

-g               Dumps the global symbols in the symbol
                 table of an archive library file.

-h               Dumps the section headers.

-i               Dumps the symbolic information header.

-l               Dumps line number information.

-o               Dumps each optional header.

-P               Dumps the procedure descriptor table.

-r               Dumps relocation information.

-R               Dumps the relative file index table.

-s               Dumps the section contents.

-t               Dumps symbol table entries.

Table 1-2. Main odump Options
Option Name      Purpose

-d number        Dumps the section number, or a range of
                 section numbers, that starts at the specified
                 number and ends with the last section
                 number or the number you specify with the
                 +d auxiliary option.

+d number        Dumps the sections in a range that starts with
                 the first section or with the section you
                 specify with the -d option.

-n name          Dumps information only for the named entry
                 name. Use this option with the -h, -s, -r,
                 -l, and -t options.

-p               Suppresses the printing of headers.

-t index         Dumps only the indexed symbol table entry.
                 You can specify a range of table entries
                 by using the +t option with the -t option.

+t index         Dumps symbol table entries in a range that
                 ends with the indexed entry. The range
                 begins with the first symbol table entry or
                 with the entry that you specify with the
                 -t option.

-v               Dumps information in symbolic rather than
                 numeric representation (for example,
                 Static rather than 0x02). Use this option
                 with all dump options except -s.

-z name,number   Dumps the line number entry or a range
                 of entries that starts at the specified
                 number for the named function.

+z number        Dumps the line numbers in a range that starts
                 at the function name or the number specified
                 by the -z option and ends at the number
                 specified by the +z option.

Table 1-3. Auxiliary odump Options
1.6.2 Listing Symbol Table Information (nm)

The nm tool prints symbol table information for object files and archive
files.

nm options filename1 [filename2 ... filenamen]

In the above syntax description, options is one or more of the characters
(listed in Table 1-5) that specify the type of information to be printed;
filename specifies the object file(s) or archive file(s) from which symbol
table information is to be extracted. If you don't specify a file, nm
assumes a.out.

The meanings of the character keys shown in an nm listing are described 
in Table 1-4. 
Key Description



N Nil storage class, which avoids loading of 
unused external references. 

T External text. 

t Local text. 

D External initialized data. 

d Local initialized data. 

B External zeroed data. 

b Local zeroed data. 

A External absolute data. 

a Local absolute data. 

U External undefined data. 

G External small initialized data. 

S External small zeroed data. 

s Local small zeroed data. 

R External read-only data. 

r Local read-only data. 

C Common data. 

E Small common data. 

V External small undefined. 
Table 1-4. Character Key Meanings 



Option Name      Purpose

-A               Prints the listing in System V format
                 (default).

-B               Prints the listing in BSD format.

-a               Prints debugging information (turns BSD
                 output into System V format).

-b               Prints the value field in octal.

-d               Prints the value field in decimal (the
                 default for System V output).

-e               Prints only external and static variables.

-h               Suppresses printing of headers.

-n               Sorts external symbols by name for System V
                 format. Sorts all symbols by value for
                 Berkeley format (by name is the BSD
                 default output).

-o               Prints the value field in octal (System V output).
                 Prints the filename immediately before
                 each symbol name (BSD output).

-p               Lists symbols in the order they appear in the
                 symbol table.

-r               Reverses the sort that you specified for
                 external symbols with the -n and -v
                 options.

Table 1-5. Symbol Table Dump (nm) Options
Option Name      Purpose

-T               Truncates characters in exceedingly long
                 symbol names; inserts an asterisk as the
                 last character of the truncated name. This
                 may make the listing easier to read.

-u               Prints only undefined symbols.

-v               Sorts external symbols by value (default
                 for Berkeley format).

-V               Prints the version number of nm.

-x               Prints the value field in hexadecimal.

Table 1-5. Symbol Table Dump (nm) Options (continued)



1.6.3 Determining a File's Type (file) 

The file tool lists the properties of program source, text, object, and other 
files. 

This tool often erroneously recognizes command files as C programs. For
more information, see the file(1) manual page in the IRIS-4D User's
Reference Manual.



Syntax:

file filename1 [filename2 ... filenamen]



Example:

% file test.o a.out
test.o: mipsel demand paged pure executable not stripped
a.out:  mipsel demand paged pure executable not stripped
%
1.6.4 Determining a File's Section Sizes (size)

The size tool prints information about the text, rdata, data, sdata, bss,
and sbss sections of the specified object or archive file(s). The contents
and format of section data are described in Chapter 9 of the Assembly
Language Programmer's Guide.



Syntax:

size options [filename1 filename2 ... filenamen]

In the above syntax description, options is an alphabetic character (listed
in Table 1-6) that specifies the format of the listing; filename specifies the
object or archive file(s) whose properties are to be listed. If you don't
specify a file, size assumes a.out.

Below are two examples of the size command and the listings they
produce. The first prints the section sizes in octal, the second in decimal.

% size -B -o test.o
        text   data    bss  rdata  sdata  sbss  decimal  hex
test.o  31250  2010  40470    550    210    50    31232  7a00

% size -B -d test.o
        text   data    bss  rdata  sdata  sbss  decimal  hex
test.o  12968  1032  16696    360    136    40    31232  7a00

%
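The last two columns of each listing are the total of the six section sizes, printed in decimal and in hexadecimal; only the individual sizes change base between -o and -d (for example, text is 31250 in octal, which is 12968 in decimal). The C sketch below reproduces that arithmetic with the test.o figures; struct section_sizes, section_total, and print_size_row are invented names, and this is not how size itself is implemented.

```c
#include <stdio.h>

/* Section sizes for one object file, in bytes. */
struct section_sizes {
    unsigned long text, data, bss, rdata, sdata, sbss;
};

/* The "decimal" and "hex" columns of size output are the sum of the
 * six section sizes. */
unsigned long section_total(const struct section_sizes *s)
{
    return s->text + s->data + s->bss + s->rdata + s->sdata + s->sbss;
}

/* Prints one row laid out like the -d listing above: sizes in decimal,
 * then the total in decimal and in hexadecimal. */
void print_size_row(const char *name, const struct section_sizes *s)
{
    unsigned long total = section_total(s);

    printf("%s %lu %lu %lu %lu %lu %lu %lu %lx\n", name,
           s->text, s->data, s->bss, s->rdata, s->sdata, s->sbss,
           total, total);
}
```

With the values 12968, 1032, 16696, 360, 136, and 40, the total is 31232, which is 7a00 in hexadecimal, matching both listings.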
Option Name      Purpose

-A               Prints data section headers in System V
                 format (default).

-B               Prints data section headers in Berkeley
                 format.

-d               Prints the section sizes in decimal.

-o               Prints the section sizes in octal.

-V               Prints the version of size that you're
                 using.

-x               Prints the section sizes in hexadecimal.

Table 1-6. Size Options



1.7 Archiver

An archive library is a file that contains one or more routines in object
(.o) file format; the term object as used in this chapter refers to an .o file
that is part of an archive library file. When a program calls an object not
explicitly included in the program, the link editor (ld) looks for that object
in an archive library. The link editor then loads only that object (not the
whole library) and links it with the calling program.
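The selective loading just described can be modeled as a search for the archive member that defines a needed symbol. The sketch below is a simplified, hypothetical model: struct ar_member and find_defining_member are invented for illustration, and the real link editor consults the archive's symdef table (see the s option in the examples that follow) rather than scanning members one by one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an archive member: its name and the external
 * symbols it defines. */
struct ar_member {
    const char *name;
    const char *const *defines;
    size_t ndefines;
};

/* Returns the index of the first member that defines symbol, or -1 if
 * no member does. A link editor resolving an undefined reference would
 * load only that member, not the whole library. */
long find_defining_member(const struct ar_member *members, size_t nmembers,
                          const char *symbol)
{
    size_t i, j;

    for (i = 0; i < nmembers; i++) {
        for (j = 0; j < members[i].ndefines; j++) {
            if (strcmp(members[i].defines[j], symbol) == 0)
                return (long)i;
        }
    }
    return -1;
}
```

If a program refers to a symbol defined only in string.o, only string.o is pulled out of the library; a symbol that no member defines remains unresolved.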

The archiver (ar) creates and maintains archive libraries and has the fol- 
lowing main functions: 

• Copying new objects into the library. 

• Replacing existing objects in the library. 

• Moving objects about the library. 

• Copying individual objects from the library into individual object files.

The sections that follow describe the syntax of the ar (archiver) command
and some examples of how to use it. See the ar(1) manual page in the
IRIS-4D User's Reference Manual for additional information.
Syntax:

ar options [posObject] libName [object1 ... objectN]

The following explains the parameters in the above syntax description: 

• options is one or more characters (listed in Tables 1-7 and 1-8) that 
specify the action that the archiver is to take. When you specify more 
than one option character, group the characters together with no 
spaces between; don't place a dash (-) character before the option 
characters. 

• posObject is the name of an object within an archive library. It speci- 
fies the relative placement (either before or after posObject) of an ob- 
ject that is to be copied into the library or moved within the library. 
A posObject is required when the m or r options are specified to- 
gether with the a, b, or i suboptions. Example 4 below shows the use 
of a posObject parameter. 

• libName is the name of the archive library you are creating, updating, 
or extracting information from. 

• object is the name of the object(s) or object file(s) that you are
manipulating.

1.7.1 Examples 

1. Create a new library and add routines to it.

% ar cr libtest.a mcount.o monl.o string.o

Option c suppresses archiver messages during the creation process.
Option r creates the library libtest.a and adds mcount.o, monl.o, and
string.o.

2. Add an object (.o) file to an existing library, or replace one that is
already in it.

% ar r libtest.a monl.o

Option r replaces monl.o in the library libtest.a. If monl.o didn't
already exist in the library, it would be added.

CAUTION: If you specify the same file twice in an argument list, it
appears twice in the archive.

3. Update the library's symdef table.

% ar ts libtest.a 

Option s creates the symdef table and t lists the table of contents. 

NOTE: After you create or change a library, you must always use the 
s option to update the symdef (symbol definition) table of the archive 
library. The link editor uses the symdef table to locate objects during 
the link process. 

4. Add a new file immediately before a specified file in the library. 

% ar rb mcount.o libtest.a new.o 

Option r adds new.o to the library libtest.a. Option b followed by
posObject mcount.o causes the archiver to place new.o immediately
before mcount.o.



1.7.2 Archiver Options

Table 1-7 lists the archiver options. You must specify exactly one of the
following options: d, m, p, q, r, or x. In addition, you can optionally
specify the c, l, s, t, and v options, and any of the archiver suboptions
listed in Table 1-8.
Option   Purpose

c        Suppresses the warning message that the archiver issues
         when it discovers that the archive you specified doesn't
         already exist.

d        Deletes the specified objects from the archive.

l        Puts the archiver's temporary files in the current working
         directory. Ordinarily, the archiver puts those files in
         /tmp. This option is useful when /tmp is full.

m        Moves the specified files to the end of the archive. If you
         want to move the object to a specific position in the
         archive library, specify an a, b, or i suboption together
         with the posObject parameter.

p        Prints the specified object(s) in the archive on the
         standard output device (usually the terminal screen).

q        Adds the specified object files to the end of the archive.
         An existing object file with the same name is not deleted,
         and the link editor will continue to use the old file. This
         option is similar to the r option (described below), but is
         faster. Use it when creating a new library.

Table 1-7. Archiver Options
Option   Purpose

r        Adds the specified object files to the archive. This option
         deletes duplicate objects in the archive. If you want to add
         the object at a specific position in the archive library,
         specify an a, b, or i suboption together with the posObject
         parameter. See Example 4 in the preceding section for an
         example of using the posObject parameter.

         See also the u suboption.

         Use the r option when updating existing libraries.

s        Creates a symdef table in the archive. You must use this
         option each time you create or change the archive library.

         At least one of the following options must be specified
         with the s option: m, p, q, r, or t.

t        Prints a table of contents on the standard output (usually
         the screen) for the specified object or archive file.

v        Lists descriptive information during the process of
         creating or modifying the archive. When specified with the
         t option, produces a verbose table of contents.

x        Copies the specified objects from the archive and places
         them in the current directory. Duplicate files are
         overwritten. The last modified date is the current date,
         unless you specify the o suboption. Then, the date stamp on
         the archive file is the last modified date.

         If no objects are specified, copies all the library objects
         into the current directory.

Table 1-7. Archiver Options (continued)
1-34 Compiler Guide IRIS-4D System



The archiver has these suboptions:

Suboption  Use with option  Purpose

a          m or r           Specifies that the object file follow the
                            posObject file you specify in the ar
                            statement.(1)

b          m or r           Specifies that the object file precede the
                            posObject file you specify in the ar
                            statement.(1)

i          m or r           Same as b.(1)

o          x                Used when extracting a file from the
                            archive to the current directory. Forces
                            the last modified date of the extracted
                            file to match that of the archive file.

u          r                The archiver replaces the existing object
                            file when its last modified date is earlier
                            than (precedes) that of the new object file.

(1) See Example 4 in the preceding section for an example of using the
posObject parameter.

Table 1-8. Archiver Suboptions
2. Improving Program Performance



This chapter describes facilities that can help reduce the execution time of 
your programs; it contains the following major sections: 

• Profiling, which describes the advantages of the profiler and how to 
use it. The profiler isolates those portions of your code where execu- 
tion is concentrated and provides reports that indicate where you 
should devote your time and effort for coding improvements. 

• Optimization, which describes the compiler optimization facility and 
how to use it. The section also gives examples showing optimization 
techniques. 

• Limiting the Size of Global Pointer Data, which describes the global 
pointer area and how, through controlling the size of variables and 
constants that the compiler places in this area, you can improve pro- 
gram performance. 



2.1 Introduction 

The best way to produce efficient code is to follow good programming 
practices: 

• Choose good algorithms and leave the details to the compiler. 

• Avoid tailoring your work for any particular release or quirk of the 
compiler system. 

As technological advances cause MIPS to make changes to the current 
compiler system, anything you tailor now might negatively affect future 
program performance. Moreover, tailored code might not work at all with 
new versions of the system. To take action on possible compiler ineffi- 
ciencies, report them directly to MIPS. 
2.2 Profiling

This section describes the concept of profiling, its advantages and disad- 
vantages, and how to use the profiler. 



2.2.1 Overview 

Profiling helps you find the areas of code where most of the execution 
time is spent. In the typical program, execution time is confined to a 
relatively few sections of code; it's profitable to concentrate on improving 
coding efficiency in only those sections. The compiler system provides the 
following profile information: 

• pc sampling (pc stands for program counter), which highlights the
execution time spent in various parts of the program.

You obtain pc sampling information by link editing your program with
the -p option and then executing the resulting program object, which
generates profile data in raw format.

• Invocation counting, which gives the number of times each procedure
in the program is invoked.

• Basic block counting, which measures the execution of basic blocks (a
basic block is a sequence of instructions that is entered only at the
beginning and exits only at the end). This option provides statistics
on individual lines.

You obtain invocation counting and basic block counting information
using the pixie program. Pixie takes your executable program and creates
an equivalent program containing additional code that counts the
execution of each basic block. Running pixie and then executing the
equivalent program generates the profile data in raw format.

Using the prof program, you can create a formatted listing of the raw pro- 
file data. The listings can indicate where to correct sub-optimal coding, 
substitute better algorithms, or substitute assembly language. The listings 
also indicate if your program has exercised all portions of the code. 

Figure 2-1 gives an example of a pc sampling listing produced from a
program compiled with the -p compiler option. The prof program produced
the listing from the raw profile data using the -procedure option.
Procedures - PC Sampling
Profiler option: -procedure

-p[rocedures] using pc-sampling;
sorted in descending order by total time spent in each procedure;
unexecuted procedures excluded

Each sample covers ... byte(s) for 4.2% of 0.2400 seconds

%time   seconds   cum %   cum sec   procedure (file)

 25.0    0.0600    25.0      0.06   main (fixfont.p)
 16.7    0.0400    41.7      0.10   write_string (../textoutput.c)
 12.5    0.0300    54.2      0.13   write_char (../textoutput.c)
 12.5    0.0300    66.7      0.16   write_integer (../textoutput.c)
  8.3    0.0200    75.0      0.18   write (../write.s)
  4.2    0.0100    79.2      0.19   writeln (../textoutput.c)
  4.2    0.0100    83.3      0.20   write_chars (../textoutput.c)
  4.2    0.0100    87.5      0.21   read (../read.s)
  4.2    0.0100    91.7      0.22   open (../open.s)
  4.2    0.0100    95.8      0.23   rewrite (../rewrite.c)
  4.2    0.0100   100.0      0.24   ... (../textinput.c)

Notes on the listing:
  0.03 seconds (12.5% of execution time) was spent in write_integer.
  0.16 seconds (66.7% of total execution time) were spent cumulatively
  in the main, write_string, write_char, and write_integer routines.
  The name of the source file for write_integer is ../textoutput.c.

Figure 2-1. Profiler Listing for PC Sampling



Figures 2-2 through 2-6 show listings from raw data produced by pixie. 
The prof option used is given at the top of each figure. 
Procedures - Invocation Counting
Profiler option: -pixie -invocation

-i[nvocations] using basic-block counts;
the called procedures are sorted in descending order by number of
calls; a '?' in the columns marked '#calls' or 'line' means that data
is unavailable because part of the program was compiled without
profiling.

called procedure   #calls   %calls   from line   calling procedure (file)

eoln                 4017    81.51          37   main (pix.p)
...
sbrk                    4    66.67         207   morecore (../malloc.c)
                        1    16.67         110   malloc (../malloc.c)
                        1    16.67         115   malloc (../malloc.c)
close                   4   100.00         108   fclose (../flsbuf.c)
fflush                  4   100.00         107   fclose (../flsbuf.c)

The entries between eoln and sbrk cover write_char, write_string,
write_integer, eof, writeln, and readln, each called mostly from
main (pix.p).

Notes on the listing:
  eoln was called 4017 times from line 37 of main. This represented
  81.51% of the calls to eoln.
  The source code for main is in the file pix.p.

Figure 2-2. Profiler Listing for Procedure Invocations
Procedures - Basic Block Counts
Profiler option: -pixie -procedure

-p[rocedures] using basic-block counts;
sorted in descending order by the number of cycles executed in each
procedure; unexecuted procedures are excluded

148137751 cycles

cycles   %cycles   cum %   cycles/call   bytes/line   procedure (file)

The listing begins with write_char (../textoutput.c), main (fixfont.p),
eoln (../textinput.c), and read_char (../textinput.c), followed by
write_chars, write_integer, and write_string (../textoutput.c) and the
remaining routines, down to pad (../textoutput.c), reset (../reset.c),
fopen (../fopen.c), sbrk (../sbrk.s), rewrite (../rewrite.c),
fstat (../fstat.s), and, last, creat (../stringarg1.s).

Notes on the listing:
  148137751 is the total number of program cycles.
  The cum % value 92.91 is the cumulative share of all cycles used by
  write_char, main, eoln, and read_char.
  The first line gives the cycles used by write_char, in units and as a
  percentage (32.45%) of total program cycles; its statistics describe
  calls to write_char compiled from source file textoutput.c.
  The cycles/call and bytes/line columns give per-procedure averages;
  for example, fopen used 13 bytes per line.

Figure 2-3. Profiler Listing for Procedures Based on Basic Block
Counts
Procedures - Basic Block Counts (with clock time)
Profiler option: -pixie -procedure -clock

-p[rocedures] using basic-block counts;
sorted in descending order by the number of cycles executed in each
procedure; unexecuted procedures are excluded

148137751 cycles (18.5172 seconds at 8.00 megahertz)

cycles   %cycles   cum %   seconds   cycles/call   bytes/line   procedure (file)

Procedures in the listing, in descending order of cycles:
  write_char, main, eoln, read_char, write_chars, write_integer,
  write_string, readln, writeln, eof, _flsbuf, _refill, write, read,
  morecore, malloc, pad, reset, fopen, sbrk, rewrite, fstat, isatty,
  gtty, ioctl, open, creat

Notes on the listing:
  The profiler computes the time in seconds based on the megahertz
  machine speed specified in the -clock option.
  This listing contains the same information as the listing shown in
  Figure 2-3, except that this listing also contains the number of
  seconds spent in each procedure.
Figure 2-4. Profiler Listing for Procedures Based on Basic Block
Counts (with clock times)
Heavy - Basic Block Counts
Profiler option: -pixie -heavy

-h[eavy] using basic-block counts;
sorted in descending order by the number of cycles executed in each
line; unexecuted lines are excluded

procedure (file)
write_char (../textoutput.c)
eoln (../textinput.c)
main (fixfont.p)
read_char (../textinput.c)
main (fixfont.p)
write_char (../textoutput.c)
eoln (../textinput.c)
read_char (../textinput.c)
main (fixfont.p)
write_chars (../textoutput.c)
write_char (../textoutput.c)
write_char (../textoutput.c)
write_integer (../textoutput.c)

Note on the listing:
  The most heavily used line is 120, located in procedure write_char,
  compiled from source file textoutput.c.

Figure 2-5. Profiler Listing for Heavy Line Usage
Lines - Basic Block Counts
Profiler option: -pixie -lines

-l[ines] using basic-block counts;
grouped by procedure, sorted by cycles executed per procedure;
'?' means that because a procedure was compiled without profiling,
we lack line number information for it

Figure 2-6. Profiler Listing for Line Information
2.2.2 How Basic Block Counting Works

Figure 2-7 shows the steps to follow in obtaining basic block counts.
Details of the steps shown in the figure are as follows:

1. Compile and link-edit. Do not use the -p option. For example:

cc -c myprog.c

cc -o myprog myprog.o

2. Run the profiling program pixie. For example:

pixie -o myprog.pixie myprog

Pixie takes myprog and writes an equivalent program containing
additional code that counts the execution of each basic block. Pixie also
generates a file (myprog.Addrs) that contains the address of each of
the basic blocks. For more information, see the pixie(1) section in
the IRIS-4D User's Reference Manual.

3. Execute myprog.pixie, which was generated by pixie. For example:

myprog.pixie

This program generates the file myprog.Counts, which contains the
basic block counts.

4. Run the profile formatting program prof, which extracts information
from myprog.Addrs and myprog.Counts, and prints it in an easily
readable format. For example:

prof -pixie myprog myprog.Addrs myprog.Counts

NOTE: Specifying myprog.Addrs and myprog.Counts is optional; prof
searches by default for files with names of the form
program_name.Addrs and program_name.Counts.

You can run the program several times, altering the input data, and 
create multiple profile data files, if you desire. See the section Aver- 
aging Prof Results later in this chapter for an example. 

You can include or exclude information on specific procedures within
your program by using the -only or -exclude prof options (Table 2-1).
Figure 2-7. How Basic Block Counting Works
2.2.3 Averaging Prof Results

A single run of a program may not produce the typical results you require.
You can repeatedly run the version of your program created by pixie,
varying the input with each run; you can then use the resulting
.Counts files to produce a consolidated report. For example:

1. Compile and link-edit. Do not use the -p option. For example:

cc -c myprog.c

cc -o myprog myprog.o

2. Run the profiling program pixie. For example:

pixie -o myprog.pixie myprog

This step produces the myprog.Addrs file to be used in Step 4, as well
as the modified program myprog.pixie.

3. Run the profiled program as many times as desired. Each time you
run the program, a myprog.Counts file is created; rename this file
before executing the next sample run. For example:

myprog.pixie < input1 > output1
mv myprog.Counts myprog1.Counts
myprog.pixie < input2 > output2
mv myprog.Counts myprog2.Counts
myprog.pixie < input3 > output3
mv myprog.Counts myprog3.Counts

4. Create the report as shown below.

prof -pixie myprog myprog.Addrs myprog[123].Counts

prof takes an average of the basic block data in the myprog1.Counts,
myprog2.Counts, and myprog3.Counts files to produce the profile report.
2.2.4 How PC-Sampling Works

Figure 2-8 gives the steps to follow in obtaining pc-sampling information.

Figure 2-8. How PC-Sampling Works

Details of the steps shown in Figure 2-8 are as follows:

1. Compile and link-edit using the -p option. For example:

cc -c myprog.c

cc -p -o myprog myprog.o

Note that the -p profiling option must be specified during the link ed- 
iting step to obtain pc sampling information. 

2. Execute the profiled program. During execution, profiling data is 
saved in the profile data file (the default is mon.out). 

myprog 

You can run the program several times, altering the input data, and
create multiple profile data files, if you desire. See the section
Averaging Prof Results earlier in this chapter for an example.

3. Run the profile formatting program prof, which extracts information 
from the profile data file(s) and prints it in an easily readable format. 

prof -procedure myprog mon.out 

For more information on prof, see the prof(1) section in the IRIS-4D
User's Reference Manual.

You can include or exclude information on specific procedures within
your program by using the -only or -exclude profiler options (Table
2-1).



2.2.5 Creating Multiple Profile Data Files 

When you run a program using pc-sampling, raw data is collected and 
saved in the profile data file mon.out. If you wish to collect profile data 
in several files, or specify a different name for the profile data file, set the 
environment variable PROFDIR as follows: 

C Shell                    Bourne Shell

setenv PROFDIR string      PROFDIR=string; export PROFDIR

This causes the results to be saved in the file string/pid.progname, where
pid is the process id of the executing program and progname is its name
as it appears in argv[0]; string is the name of a directory you must create
before you run the program.
2.2.6 Running the Profiler (prof)

The profiler program converts the raw profiling information into either a
printed listing or an output file for use by the compiler. To run the
program, type prof followed by the optional parameters indicated below:

prof [options] [pname] { [profile_filename ...] |
                         [pname.Addrs pname.Counts] }

The prof parameters are summarized below:

options is one of the keywords or keyword abbreviations shown in Table
2-1. (You can specify either the entire name or the initial character of
the option, as indicated in the table.)

pname specifies the name of your program. The default file is a.out.

profile_filename specifies one or more files containing the profile data
gathered when the profiled program executed. If you specify more than
one file, prof sums the statistics in the resulting profile listings.

pname.Addrs (produced by running pixie) and pname.Counts (produced
by running the pixie-modified version of the program) contain the basic
block counting data.

The prof program takes defaults for profile_filename as follows:

• If you don't specify profile_filename, the profiler looks for the
mon.out file; if this file doesn't exist, it looks for the profile input data
file(s) in the directory specified by the PROFDIR environment variable
(see the preceding section Creating Multiple Profile Data Files).

• If you don't specify profile_filename, but do specify -pixie, then prof
looks for pname.Addrs and pname.Counts and provides basic block
count information if these files are present.

You might wish to consider using the -merge option when you have more
than one profile data file; this option merges the data from several profile
files into one file. See Table 2-1 for information on the -merge option.
Name               Result

-p[rocedures]      Lists the time spent in each procedure.
                   See Figure 2-3 for a sample output listing.

-pixie             Basic block counting. Indicates that information is
                   to be generated on basic block counting, and that the
                   program.Addrs and program.Counts files produced by
                   pixie are to be used by default.
                   See Figures 2-3 through 2-6 for sample output.

-i[nvocations]     Basic block counting. Lists the number of times each
                   procedure is invoked. The -exclude and -only options
                   described below apply to callees, but not to callers.
                   See Figure 2-2 for sample output.

-l[ines]           Basic block counting. Lists statistics for each line
                   of source code.
                   See Figure 2-6 for sample output.

-o[nly]            Reports information on only the procedure specified
procedure_name     by procedure_name rather than the entire program. You
                   may specify more than one -o option. If you specify
                   upper-case O, prof uses only the named procedure(s),
                   rather than the entire program, as the base upon which
                   it calculates percentages.

-e[xclude]         Excludes information on the procedure(s) specified by
procedure_name     procedure_name (and their descendants). If you specify
                   upper-case E for Exclude, prof also omits that
                   procedure from the base upon which it calculates
                   percentages. The -exclude option overrides the
                   -include option.

-z[ero]            Basic block counting. Prints a list of procedures that
                   are never invoked.

Table 2-1. Options for the Profile List Program (prof)
Name               Result

-h[eavy]           Basic block counting. Same as the -lines option, but
                   sorts the lines by their frequency of use.
                   See Figure 2-5 for a sample output listing.

-c[lock] n         Basic block counting. Lists the number of seconds
                   spent in each routine, based on the CPU clock
                   frequency n, expressed in megahertz. If you omit n,
                   it defaults to 8.0. Don't rely on the default if the
                   next argument (program_name or profile_filename)
                   begins with a digit.
                   See Figure 2-4 for a sample output listing.

-t[estcoverage]    Basic block counting. Lists line numbers that contain
                   code that is never executed.

-m[erge] filename  This option is useful when multiple input files of
                   profile data (normally in mon.out) are used. It causes
                   the profiler to merge the input files into filename,
                   making it possible to specify the name of the merged
                   file (instead of several file names) on subsequent
                   profiler runs.

Table 2-1. Options for the Profile List Program (prof) (continued)
Name               Result

-q[uit] n          Allows you to condense output listings by truncating
                   unwanted lines. You can specify n in three different
                   ways:

                   n       n is an integer. All lines after n lines are
                           truncated.

                   n%      n is an integer followed by the percentage
                           sign. All lines after the line containing n%
                           calls in the %calls column are truncated.

                   ncum%   n is an integer followed by cum and a
                           percentage sign. All lines after the line
                           containing ncum% calls in the cum% column are
                           truncated.

                   For example, to eliminate all lines after the fourth
                   line of the sample listing below (the line showing
                   13.95 in the %calls column and 92.91 in the cum%
                   column), you could specify any one of the following:

                       prof -q4
                       prof -q 13%
                       prof -q 92cum%

                   calls      %calls    cum%    seconds
                   48071708    32.45   32.45     6.0090
                   42443503    28.65   61.10     5.3054
                   26457936    17.86   78.96     3.3072
                   20662326    13.95   92.91     2.5828
                   4307932      2.91   95.82     0.5385
                   3678408      2.48   98.30     0.4598
                   1573858      1.06   99.36     0.1967
                   362700       0.24   99.61     0.0453
                   279002       0.19   99.80     0.0349
                   251152       0.17   99.97     0.0314
                   30283        0.02   99.99     0.0038
                   13391        0.01  100.00     0.0017
                   2923         0.00  100.00     0.0004

Table 2-1. Options for the Profile List Program (prof) (continued)
2.3 Optimization

This section gives background on the compiler optimization facilities and 
describes their benefits, the implications of optimizing and debugging, and 
the major optimizing techniques. 



2.3.1 Overview 

Global Optimizer 

The global optimizer is a single program that improves the performance of 
C and FORTRAN object programs by transforming existing code into more 
efficient coding sequences. Although the same optimizer processes C and 
FORTRAN optimizations, it does distinguish between C and FORTRAN 
programs to take advantage of the different language semantics involved. 

Today, most compilers perform certain code optimizations, although the 
extent to which they perform these optimizations varies widely. The MIPS 
compilers perform more extensive optimizations compared with the average 
compiler available. These advanced optimizations are the results of the 
latest research into better and more powerful compiler techniques. 

The MIPS compiler performs both machine-independent and machine-de- 
pendent optimizations. MIPS machines and other machines with RISC 
architectures provide a better target for machine-dependent optimizations. 
This is because the low-level instructions of RISC machines provide more 
optimization opportunities than the high-level instructions in other ma- 
chines. Even optimizations that are machine-independent have been 
found to be effective on machines with RISC architectures. Although 
most of the optimizations performed by the global optimizer are machine- 
independent, they have been specifically tailored to the MIPS environ- 
ment. 

Benefits 

The primary benefits of optimization, of course, are faster running pro- 
grams and smaller object code size. However, the optimizer can also 
speed up development time. For example, your coding time can be re- 
duced by leaving it up to the optimizer to relate programming details to 
execution time efficiency. This frees you up to focus on the more crucial 
global structure of your program. Moreover, programs often yield op- 
timizable code sequences regardless of how well you write your source pro- 
gram. 
Optimization and Debugging

Optimize your programs only when they are fully developed and debugged. 
Although the optimizer doesn't alter the flow of control within a program, 
it may move operations around so that the object code doesn't correspond 
to the source code. These changed sequences of code may create confu- 
sion when using the debugger. 

Loop Optimization 

Optimizations are most useful in program areas that contain loops. The 
optimizer moves loop-invariant code sequences outside loops so that they 
are performed only once instead of multiple times. Apart from loop-in- 
variant code, loops often contain loop-induction expressions that can be 
replaced with simple increments. In programs composed of mostly loops, 
global optimization can often reduce the running time by half. 

The example in Figure 2-9 illustrates loop optimization. The source code
below was compiled with and without the -O compiler optimization option:



#include <string.h>

void
left(a, distance)
char a[];
int distance;
{
    int j, length;

    length = strlen(a) - distance;
    for (j = 0; j < length; j++)
        a[j] = a[j + distance];
}

Figure 2-9. Example of Loop Optimization



Figure 2-10 shows the unoptimized and optimized code produced by the 
compiler. Note that the optimized version contains fewer total instructions 
and fewer instructions that reference memory. Wherever possible, the 
optimizer replaces load and store instructions (which reference memory) 
with the faster computational instructions that perform operations only in 
registers. 
Unoptimized:

loop is 13 instructions long using 8 memory references.

#  8    for (j=0; j<length; j++)
        sw      $0, 36($sp)        # j = 0
        ble     $24, 0, $33        # length >= 0
$32:
#       a[j] = a[j+distance];
        lw      $25, 36($sp)       # j
        lw      $8, 44($sp)        # distance
        addu    $9, $25, $8        # j+distance
        lw      $10, 40($sp)       # address of a
        addu    $11, $10, $9       # address of a[j+distance]
        lbu     $12, 0($11)        # a[j+distance]
        addu    $13, $10, $25      # address of a[j]
        sb      $12, 0($13)        # a[j]
        lw      $14, 36($sp)       # j
        addu    $15, $14, 1        # j+1
        sw      $15, 36($sp)       # j++
        lw      $3, 32($sp)        # length
        blt     $15, $3, $32       # j < length
$33:

Optimized:

loop is 6 instructions long using 2 memory references.

#  8    for (j=0; j<length; j++)
        move    $5, $0             # j = 0
        ble     $4, 0, $33         # length >= j
        move    $2, $16            # address of a[j]
        addu    $6, $16, $17       # address of a[j+distance]
$32:
#       a[j] = a[j+distance];
        lbu     $3, 0($6)          # a[j+distance]
        sb      $3, 0($2)          # a[j]
        addu    $5, $5, 1          # j++
        addu    $2, $2, 1          # address of next a[j]
        addu    $6, $6, 1          # address of next a[j+distance]
        blt     $5, $4, $32        # j < length
$33:

Figure 2-10. Unoptimized and Optimized Code

Register Allocation

MIPS architecture emphasizes the use of registers. Therefore, register
usage has a significant impact on program performance. For example,
fetching a value from a register is significantly faster than fetching a
value from memory. Thus, to perform its intended function, the optimizer
must make the best possible use of registers.

In allocating registers, the optimizer selects those data items most suited 
for registers, taking into account their frequency of use and their location 
in the program structure. In addition, the optimizer assigns values to reg- 
isters so that their contents move minimally within loops and during proce- 
dure invocations. 

Optimizing Separate Compilation Units 

The optimizer processes one procedure at a time. Large procedures offer 
more opportunities for optimization, since more inter-relationships are ex- 
posed in terms of constructs and regions. However, because of their size, 
large procedures require more time than smaller ones. 

The uload and umerge phases of the compiler permit global optimization 
among separate units in the same compilation. Often, programs are di- 
vided into separate files, called modules or compilation units, which are 
compiled separately. This saves compile time during program develop- 
ment, since a change requires recompilation of only one compilation unit 
rather than the entire program. 

Traditionally, program modularity restricted the optimization of code to a 
single compilation unit at a time rather than over the full breadth of the 
program. For example, calls to procedures that reside in other modules 
couldn't be fully optimized together with the code that called them. 

The uload and umerge phases of the compiler system overcome this
deficiency. The uload phase links multiple compilation units into a single
compilation unit. Then, umerge orders the procedures for optimal
processing by the global optimizer (uopt).

2.3.2 Optimization Options

Figure 2-11 on the next page shows the major processing phases of the
compiler and how the compiler -On option determines the execution
sequence. The table below summarizes the functions of each of the -O
options.

Option   Result

-O3      The ulink and umerge phases process the output from the
         compilation (C or FORTRAN) phase of the compiler, which
         produces symbol table information and the program text in an
         internal format called ucode.

         The ulink phase combines all the ucode files and symbol tables,
         and passes control to umerge. Umerge reorders the ucode for
         optimal processing by uopt. Upon completion, umerge passes
         control to uopt, which performs global optimizations on the
         program.

-O2      Ulink and umerge are bypassed, and only the global optimizer
         (uopt) phase executes. It performs optimization only within
         the bounds of individual compilation units.

-O1      Ulink, umerge, and uopt are bypassed. However, the code
         generator and the assembler perform basic optimizations in a
         more limited scope.

-O0      Ulink, umerge, and uopt are bypassed, and the assembler
         bypasses certain optimizations it normally performs.

NOTE: Refer to the cc(1) or f77(1) manual page, as applicable, in the
IRIS-4D User's Reference Manual for details on the -O3 option and the
input and output files related to this option.



Figure 2-11. Optimization Phases of the Compiler

2.3.3 Full Optimization (-O3)

This section provides examples using the -O3 option. The examples
assume that the program foo consists of three files: a.c, b.c, and c.c.

To perform procedure merging optimizations (-O3) on all three files,
type:

% cc -O3 -o foo a.c b.c c.c

If you normally use the -c option to compile each file into a .o object
file, follow these steps:

1. Compile each file separately using the -j option:

% cc -j a.c
% cc -j b.c
% cc -j c.c

The -j option causes the compiler driver to produce a .u file (the
standard compiler front-end output, which is made up of ucode, an
internal language used by the compiler). None of the remaining
compilation phases are executed. The figure below illustrates the
results of executing the three commands shown above.



[Diagram: a.c, b.c, and c.c are compiled with -j into a.u, b.u, and c.u.]

Figure 2-12.

2. Enter the following command to perform optimization and complete
the compilation process:

% cc -O3 -o foo a.u b.u c.u

Figure 2-13 illustrates the results of executing the above command.

Figure 2-13.

2.3.4 Optimizing Frequently Used Modules

You may want to compile and optimize modules that are frequently called 
from programs written in the future. This can reduce the compile and 
optimization time required when the modules are needed. 

In the examples that follow, b.c and c.c represent two frequently used
modules that you wish to compile and optimize, retaining all the
information necessary to link them with future programs; future.c
represents one such program.

1. Compile b.c and c.c separately by entering the following statements: 




% cc -j b.c 

% cc -j c.c 

The -j option causes the front end (first phase) of the compiler to
produce two ucode files, b.u and c.u.

2. Manually create a file containing the external symbols in b.c and c.c
to which future.c will refer. Each symbol name must be separated
by at least one blank. Consider the skeletal contents of b.c
and c.c shown in Figure 2-14.



b.c:

foo()
{
    ...
}

bar()
{
    ...
}

zot()
{
    ...
}

struct {
    ...
} work;

c.c:

x()
{
    ...
}

help()
{
    ...
}

struct {
    ...
} ddata;

y()
{
    ...
}

Figure 2-14.

In this example, future.c will call or reference only foo, bar, x, ddata,
and y in b.c and c.c. A file (named extern for this example) must be
created containing the following symbol names:

foo bar x ddata y 

(The structure work and the procedures help and zot are used internally
only by b.c and c.c, and thus aren't included in extern.)

If you omit an external symbolic name, an error message is generated 
(see Step 4 below). 

3. Now, optimize the b.u and c.u modules (Step 1) using the extern file 
(Step 2) as follows: 

% cc -c -O3 -kp extern b.u c.u -o keep.o

In the -kp option, k designates that the link editor option p is to be
passed to the ucode loader.

Figure 2-15 illustrates Step 3. 

[Diagram: b.u, c.u, and extern (hand-created symbol list file) pass through
Procedure Merge (umerge), the Global Optimizer, the Code Generator, and the
Assembler to produce keep.o.]

Figure 2-15.



4. Create a ucode file and an optimized object code file (foo) for
future.c as follows:

% cc -j future.c

% cc -O3 future.u keep.o -o foo

The following message may appear; it means that the code in future.c
is using a symbol from the code in b.c or c.c that was not specified in
the file extern.

zot: multiply defined hidden external (should have been preserved)

Go to Step 5 if this message appears. 

5. Include zot, which the message indicates is missing, in the file extern 
and recompile as follows: 

% cc -O3 -c -kp extern b.u c.u -o keep.o
% cc -O3 future.u keep.o -o foo



2.3.5 Building a Ucode Object Library

Building a ucode object library is similar to building a coff(5) object
library. First, compile the source files into ucode object files using
the compiler driver option -j, then use the archiver just as you would
for coff(5) object libraries. Using the above example, to build a ucode
library (libfoo.b) from the source files, type:

% cc -j a.c
% cc -j b.c
% cc -j c.c
% ar crs libfoo.b a.u b.u c.u

Conventional names exist for ucode object libraries (libx.b) just as they do
for coff(5) object libraries (libx.a).



2.3.6 Using Ucode Object Libraries 

Using ucode object libraries is similar to using coff(5) object
libraries. To load from a ucode library, specify a -klx option to the
compiler driver or the ucode loader. For example, to load from the
ucode library created in the previous example, type:

% cc -O3 file1.u file2.u -klfoo -o output

Remember that libraries are searched as they are encountered on the
command line, so the order in which you specify them is important. If a
library is made from both assembly and high-level language routines, the
ucode object library contains code only for the high-level language
routines, not for all the routines as the coff(5) object library does. In
this case, you must specify to the ucode loader both the ucode object
library and the coff(5) object library, in that order, to ensure that all
modules are loaded from the proper library.

If the compiler driver is to perform both a ucode load step and a final 
load step, the object file created after the ucode load step is placed in the 
position of the first ucode file specified or created on the command line in 
the final load step. 



2.3.7 Improving Global Optimization 

This section contains coding hints recommended to increase optimizing
opportunities for the global optimizer (uopt). You should read through
the recommendations in this section and, where possible, apply them to
your code.



C and FORTRAN Programs 

Do not use indirect calls. Avoid indirect calls (calls that use routines or
pointers to functions as arguments). Indirect calls can have unknown side
effects (for example, changing global variables) that reduce the amount
of optimization.
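The following sketch contrasts an indirect call through a function pointer with a direct call. The function names are hypothetical; the point is only that in sum_indirect the compiler cannot see which routine runs, so it must assume the worst about side effects, while in sum_direct the callee is known.

```c
/* Hypothetical transform routine used by both versions below. */
static int twice(int x)
{
    return 2 * x;
}

/* Indirect: f is only known at run time, so the optimizer must assume
   the call might read or modify any global variable. */
int sum_indirect(const int *v, int n, int (*f)(int))
{
    int i, s = 0;

    for (i = 0; i < n; i++)
        s += f(v[i]);
    return s;
}

/* Direct: the callee is named, so the compiler knows exactly which
   routine runs at this call site. */
int sum_direct(const int *v, int n)
{
    int i, s = 0;

    for (i = 0; i < n; i++)
        s += twice(v[i]);
    return s;
}
```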



C Programs Only 

Function return values. Use functions to return values instead of refer- 
ence parameters. 
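As a minimal illustration (the function names are hypothetical), the two routines below compute the same result; the first passes it back through a reference (pointer) parameter, the second returns it as the function value.

```c
/* Reference parameter: the result is stored through a pointer, which
   the optimizer must treat as a possible alias for other memory. */
void square_ref(int x, int *result)
{
    *result = x * x;
}

/* Function return value: no pointer is involved, and the result can
   be passed back in a register. */
int square_val(int x)
{
    return x * x;
}
```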

Do while and repeat. Use do while (for C) instead of while or for
when possible. For do while, the optimizer doesn't have to duplicate the
loop condition in order to move code from within the loop to outside the
loop.
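The sketch below (hypothetical names) writes the same summation loop both ways. A do while body always runs at least once, so the do while version needs an explicit guard for an empty array; with that guard in place, the loop condition appears only once, at the bottom of the loop.

```c
/* while form: the condition is tested before the first pass. */
int sum_while(const int *v, int n)
{
    int s = 0, i = 0;

    while (i < n) {
        s += v[i];
        i++;
    }
    return s;
}

/* do-while form: the explicit guard handles n <= 0, because the
   body of a do while always executes at least once. */
int sum_do_while(const int *v, int n)
{
    int s = 0, i = 0;

    if (n <= 0)
        return 0;
    do {
        s += v[i];
        i++;
    } while (i < n);
    return s;
}
```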

Unions and variant records. Avoid unions (in C) that cause overlap
between integer and floating point data types. Such overlap keeps the
optimizer from assigning the fields to registers.

Use local variables. Avoid global variables. In C programs, declare any 
variable outside of a function as static, unless that variable is referenced 
by another source file. Minimizing the use of global variables increases 
optimization opportunities for the compiler. 
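A short sketch of this advice, assuming a file in which call_count is used only locally while shared_total is referenced from another source file (all names are hypothetical):

```c
/* Used only in this file: static keeps other compilation units from
   referring to it by name. */
static int call_count = 0;

/* Referenced from another source file: left as an ordinary external. */
int shared_total = 0;

int record(int amount)
{
    int doubled = amount * 2;   /* local: a natural register candidate */

    call_count++;
    shared_total += doubled;
    return call_count;
}
```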

Value parameters. Use value parameters instead of reference parameters 
or global variables. Reference parameters have the same degrading effects 
as the use of pointers (see below). 

Pointers and aliasing. Aliases can often be avoided by introducing local 
variables to store dereferenced results. (A dereferenced result is the value 
obtained from a specified address.) Dereferenced values are affected by 
indirect operations and calls, whereas local variables aren't. Therefore, 
they can be kept in registers. Figure 2-16 shows how the proper
placement of pointers and the elimination of aliasing let the compiler
produce better code.



Consider the following example, which uses pointers. Because
the statement *p++ = 0 might modify len, the compiler cannot
place len in a register for optimal performance; instead, it
must load len from memory on each pass through the loop.

Source Code:

int len = 10;
char a[10];

void
zero()
{
    char *p;
    for (p = a; p != a + len; ) *p++ = 0;
}

Generated Assembly Code:

#  8    for (p = a; p != a + len; ) *p++ = 0;
        ...                      # p
        lw      $3, len
        addu    $24, $4, $3
        beq     $24, $4, $33     # a + len != a
$32:
        sb      $0, 0($2)        # *p = 0
        addu    $2, $2, 1        # p++
        lw      $25, len
        addu    ..., $25
        bne     ...              # len + a != p
$33:

Figure 2-16.

Two different methods can be used to increase the efficiency of this
example: using subscripts instead of pointers and using local variables
to store unchanging values.

Using subscripts instead of pointers. The use of subscripting in the pro- 
cedure azero eliminates aliasing; the compiler keeps the value of len in a 
register, saving two instructions, and still uses a pointer to access a effi- 
ciently, even though a pointer isn't specified in the source code. See Fig- 
ure 2-17. 



Source Code:

void
azero()
{
    int i;

    for (i = 0; i != len; i++) a[i] = 0;
}

Generated Assembly Code:

#       for (i = 0; i != len; i++) a[i] = 0;
        move    $2, $0           # i = 0
        beq     $4, 0, $37       # len != 0
        la      $5, a
$36:
        sb      $0, 0($5)        # *a = 0
        addu    $2, $2, 1        # i++
        addu    $5, $5, 1        # a++
        bne     $2, $4, $36      # i != len
$37:

Figure 2-17.



Using local variables. Specifying len as a local variable or formal argu- 
ment (as shown in Figure 2-18) ensures that aliasing can't take place and 
permits the compiler to place len in a register. 



Source Code:

char a[10];

void
lzero(len)
int len;
{
    char *p;
    for (p = a; p != a + len; ) *p++ = 0;
}

Generated Assembly Code:

#  8    for (p = a; p != a + len; ) *p++ = 0;
        move    $2, $6           # p = a
        addu    $5, $6, $4
        beq     $5, $6, $33      # a + len != a
$32:
        sb      $0, 0($2)        # *p = 0
        addu    $2, $2, 1        # p++
        bne     $5, $2, $32      # a + len != p
$33:

Figure 2-18.



In the previous example, the compiler generates slightly more efficient 
code for the second method. 
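The local-variable method also works without changing a function's interface: copy the global into a local before the loop. This sketch reuses the len and a declarations from Figure 2-16; the function name zero_copy is hypothetical. Stores through p cannot modify the local n (its address is never taken), so the compiler is free to keep n in a register instead of reloading len on every pass.

```c
int len = 10;
char a[10];

/* Copy the global len once; the loop then compares against a value
   that no store through p can change. */
void zero_copy(void)
{
    int n = len;
    char *p;

    for (p = a; p != a + n; )
        *p++ = 0;
}
```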



C Programs Only 

Write straightforward code. For example, don't use the ++ and --
operators within an expression. When you use these operators for their
values rather than for their side effects, the compiler often generates
poor code.

For example: 

Bad                          Good

while (n--) {                while (n != 0) {
    ...                          n--;
}                                ...
                             }
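The following sketch (hypothetical function names, valid for n >= 0) shows that the two forms above perform the same number of iterations. The only difference is the final value of n: the Bad form leaves it at -1 and the Good form at 0, so check that no later code depends on it before rewriting.

```c
/* Bad: the value of n-- is used as the loop test. */
int count_bad(int n)
{
    int iterations = 0;

    while (n--)
        iterations++;
    return iterations;
}

/* Good: the test and the decrement are separate statements. */
int count_good(int n)
{
    int iterations = 0;

    while (n != 0) {
        n--;
        iterations++;
    }
    return iterations;
}
```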



Use register declarations liberally. The compiler automatically assigns 
variables to registers. However, specifically declaring a register type lets 
the compiler make more aggressive assumptions when assigning register 
variables. 
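A minimal example of the declaration style this paragraph recommends; the function is hypothetical. The address of a register variable cannot be taken with &, so these declarations also rule out the aliasing described in the next paragraph.

```c
/* Sum the first n elements of v. The accumulator and index are
   declared register; writing &s or &i would now be a compile error. */
long sum_register(const int *v, int n)
{
    register long s = 0;
    register int i;

    for (i = 0; i < n; i++)
        s += v[i];
    return s;
}
```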

Addresses. Avoid taking and passing addresses (& values). This can cre- 
ate aliases, make the optimizer store variables from registers to their home 
storage locations, and significantly reduce optimization opportunities that 
would otherwise be performed by the compiler. 

VARARGS. Avoid functions that take a variable number of arguments.
This causes the optimizer to unnecessarily save all parameter registers on 
entry. 

2.3.8 Improving Other Optimization

The global optimizer processes programs only when you explicitly specify
the -O2 or -O3 option at compilation. However, the code generator
and assembler phases of the compiler always perform certain optimizations
(certain assembler optimizations are bypassed when you specify the -O0
option at compilation).

This section contains coding hints that, when followed, increase optimizing 
opportunities for the other passes of the compiler. 



C and FORTRAN Programs 

1. Use tables rather than if-then-else or switch statements. 
For example: 



OK                          More Efficient

if (i == 1) c = '1';        c = "01"[i];
else c = '0';



2. As an optimizing technique, the compiler puts the first four parameters 
of a parameter list into registers where they remain during execution 
of the called routine. Therefore, you should always declare as the 
first four parameters those variables that are most frequently manipu- 
lated in the called routine with floating point parameters preceding 
non-floating point. 

3. Use word-size variables instead of smaller ones if enough space is 
available. This may take more space but it is more efficient. 
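The table technique in item 1 can be checked with a small sketch (hypothetical function names). The table version is correct only when i is 0 or 1, because it indexes the two-character string "01"; any other value gives a different result or reads past the end of the string.

```c
/* if-then-else version */
char digit_if(int i)
{
    char c;

    if (i == 1)
        c = '1';
    else
        c = '0';
    return c;
}

/* table version: index the string literal "01" directly.
   Valid only for i == 0 or i == 1. */
char digit_table(int i)
{
    return "01"[i];
}
```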



C Programs Only 

1. Rely on libc functions (for example, strcpy, strlen, strcmp, bcopy, 
bzero, memset, and memcpy). These functions were hand-coded for 
efficiency. 

2. Use the unsigned data type for variables wherever possible, for the
following reasons: (1) because it knows the variable will always be
greater than or equal to zero, the compiler can perform optimizations
that would not otherwise be possible, and (2) the compiler generates
fewer instructions for multiply and divide operations by a power of
two. Consider the following example:

int i;
unsigned j;

return i/2 + j/2;

The compiler generates six instructions for the signed i/2 operation:

000000  20010002  li     r1,2
000004  0081001a  div    r4,r1
000008  14200002  bne    r1,r0,0x14
00000c  00000000  nop
000010  03fe000d  break  1022
000014  00001812  mflo   r3

The compiler generates only one instruction for the unsigned j/2
operation:

000018  0006c042  srl    r24,r6,1     # j / 2

In the example, i/2 is an expensive expression; however, j/2 is
inexpensive.
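The reason signed division needs more work can be seen in C itself. Since C99, integer division truncates toward zero, so -3 / 2 is -1, whereas an arithmetic right shift (such as the MIPS sra instruction) rounds toward minus infinity, giving -2. The compiler therefore cannot replace a signed i/2 with a bare shift, while for unsigned j, j / 2 and j >> 1 always agree. The function names below are hypothetical.

```c
/* Unsigned: division by 2 and a logical right shift by 1 always
   give the same result, so a single shift instruction suffices. */
unsigned half_unsigned(unsigned j)
{
    return j / 2;
}

/* Signed: C (since C99) truncates the quotient toward zero, so for
   negative odd i the result differs from an arithmetic shift, and
   the compiler must handle that case separately. */
int half_signed(int i)
{
    return i / 2;
}
```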



2.4 Limiting the Size of Global Pointer Data



Global pointer data are constants and variables that the compiler places in 
the .sdata and .sbss portions of the data and bss segments shown in Fig- 
ure 2-19. This area is referred to as the global pointer area. 













[Diagram: text segment, data segment, and bss segment; the .sdata portion
of the data segment and the .sbss portion of the bss segment form the
global pointer area.]

Figure 2-19.



(The .rdata, .data, and .sdata sections contain initialized data, and the
.sbss and .bss sections reserve space for uninitialized data that is created
by the kernel loader for the program before execution and filled with
zeros. For more information on section data, see Chapter 9 of the Assembly
Language Programmer's Guide.)

2.4.1 Purpose of Global Pointer Data

In general, the compiler system emits two machine instructions to access a
global datum. However, by using a register as a global pointer (called
$gp), the compiler creates a 65536-byte global pointer area in which a
program can access any datum with a single machine instruction, half the
number of instructions required without a global pointer. (The area is
65536 bytes because the signed 16-bit offset field of a load or store
instruction spans that many bytes around the address held in $gp.)

To maximize the number of individual variables and constants that a pro- 
gram can access in the global pointer area, the compiler first places those 
variables and constants that take the fewest bytes of memory. By default, 
the variables and constants occupying 512 or fewer bytes are placed in the 
global pointer area, and those occupying more than 512 bytes are placed 
in the .data and .bss sections. 



2.4.2 Controlling the Size of Global Pointer Data 



The more data that the compiler places in the global pointer area, the 
faster a program executes. However, if the data to be placed in the global 
pointer area exceeds 65536 bytes, the link editor prints an error message 
and doesn't create an executable object file. For most programs, the 
512-byte default produces optimal results. However, the compiler provides
the -G option to let you change the default size. For example, the
specification 

-G 8 

causes the compiler to place only those variables and constants that oc- 
cupy eight or fewer bytes in the global pointer area. 



2.4.3 Obtaining Optimal Global Data Size 



The compiler places some variables in the global pointer area regardless of
the setting of the -G option. For example, a program written in assembly
language may contain .sdata directives that cause variables and constants
to be placed into the global pointer area regardless of size. Moreover, the
-G option doesn't affect variables and constants in libraries and objects
compiled beforehand. To alter the allocation size for the global pointer
area for data from these objects, you must recompile them, specifying the
-G option and the desired value.



Version 1.0 Improving Program Performance 2-37 



Thus, two potential problems exist in specifying a maximum size with the
-G option:

• Using a value that is too small can reduce the speed of the program.

• Using a value that is too large can cause more than the maximum
65536 bytes to be placed in the global pointer area, creating an error
condition; no executable object module is produced.

The link editor -bestGnum option helps overcome these problems by
predicting an optimal value to specify for the -G option. The next
sections give examples of using the -bestGnum option and the related
-nocount and -count options.
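To make the trade-off concrete, here is a simplified model of what a -G recommendation involves. It is a hypothetical sketch, not the link editor's algorithm: it assumes you already know the size of every data item that could go in the global pointer area, that an item is placed there when its size is at most G, and that the area holds 65536 bytes, as stated above. It returns the largest G, chosen among the item sizes, whose selected items still fit.

```c
#include <stddef.h>

#define GP_AREA_BYTES 65536L   /* global pointer area capacity from the text */

/* Hypothetical model, not the link editor's algorithm.
   sizes[] holds the byte size of every candidate data item, sorted in
   ascending order. An item goes into the gp area when its size <= G.
   Returns the largest G, taken from the item sizes, whose selected
   items still fit in GP_AREA_BYTES, or 0 if even the smallest does not. */
long best_g(const long *sizes, size_t n)
{
    long total = 0;
    long best = 0;
    size_t i = 0;

    while (i < n) {
        long g = sizes[i];

        /* all items of size g are selected together once G reaches g */
        while (i < n && sizes[i] == g)
            total += sizes[i++];
        if (total > GP_AREA_BYTES)
            break;              /* a larger G would overflow the area */
        best = g;
    }
    return best;
}
```

For example, with items of 4, 4, 8, 512, and 70000 bytes the model suggests 512: raising G to 70000 would pull in the large item and overflow the area.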



2.4.4 Examples (Excluding Libraries) 

When you use the -bestGnum option without -nocount and -count,
the compiler driver assumes that you cannot recompile any libraries
associated with the program; the driver causes the link editor not to
consider libraries when predicting the optimal maximum size.

If you specify the option as shown below: 
cc -bestGnum bogus.c

the compiler produces a message giving the best value for -G; if all
program data fits into the global pointer area, a message indicates this.
For example:

All data will fit into the global pointer area 

Best -G num value to compile with is 80 (or greater) 

Because all data fits into the global pointer area, no recompilation is nec- 
essary. Consider the following example, which specifies 70000 as the 
maximum size of a data item to be placed in the global pointer area: 

cc bogus.c -G 70000 -bestGnum

The above example produces the following messages: 

Too much data in the gp area, recompile all objects with a
smaller -G num variable than 70000

Best -G num value to compile with is 1024 



In this example, the link editor doesn't produce an executable load
module and recommends recompiling as follows:

cc bogus.c -G 1024



2.4.5 Example (Including Libraries) 



You can explicitly specify that the link editor either include or exclude
specific libraries in predicting the -G value. Consider the following
example:

cc -o plotter -bestGnum plotter.o -nocount libieee.a -count liblaser.a

In the above example, the link editor assumes that libieee.a cannot be
recompiled and will continue to occupy the same space in the global
pointer area. It assumes that plotter.o and liblaser.a can be recompiled
and produces a recommended -G value to use upon recompilation.